
ISO/IEC JTC1/SC2/WG2 N3485R
L2/08-270R
2008-08-04

Universal Multiple-Octet Coded Character Set
International Organization for Standardization
Organisation Internationale de Normalisation
Международная организация по стандартизации

Doc Type: Working Group Document
Title: Proposal for encoding the Mandaic script in the BMP of the UCS
Source: UC Berkeley Script Encoding Initiative (Universal Scripts Project)
Authors: Michael Everson
Status: Individual Contribution
Action: For consideration by JTC1/SC2/WG2 and UTC
Date: 2008-08-04

1. Introduction. The Mandaic script is used to write a dialect of Eastern Aramaic, which, in its classical form, is currently used as the liturgical language of the Mandaean religion. A living language descended from Classical Mandaic is spoken by a small number of people living in and around Ahvaz, Khūzestān, in southwestern Iran; speakers are also found in emigrant communities in Sweden, Australia, and the United States. There is a considerable amount of Iranian influence on the lexicon of Classical Mandaic, and Arabic and Persian influence on the grammar and lexicon of the contemporary dialect. The script itself is likely derived from the Parthian chancery script.

2. Structure. Mandaic is a right-to-left script. It is a true alphabet, using letters regularly for vowels rather than as the matres lectionis from which they derived. The three diacritical marks are used in teaching materials to differentiate vowel quality. At present, at least, the rule is that they may be omitted from ordinary text. In this regard they are very like the Arabic fatha, kasra, and damma or the Hebrew vowel points.

U+0859 MANDAIC AFFRICATION MARK is used to extend the character set for foreign sounds (whether affrication, lenition, or another sound). See figures 8 and 9. Compare:

    Letter   Plain   With mark
    G        g       γ
    D        d       δ
    H        h       ḥ
    Ṭ        ṭ       ẓ
    K        k       χ
    P        p       f
    Ṣ        ṣ       ž?
    Š        š       č, ǰ
    T        t       θ

U+085A MANDAIC VOCALIZATION MARK is used to distinguish vowel quality of halqa, ušenna, and aksa (Hebrew alef, waw, yod):

    Letters   Plain   With mark
    BA        bā      ba
    BU        bu      bo
    BI        bi      be

U+085B MANDAIC GEMINATION MARK is used to indicate what native writers call a hard pronunciation:

    AḲA    ekka    there is
    ŠIṆA   šenna   tooth
    LIḄA   lebba   heart
    RḄH    rabba   great
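Because the marks may be omitted from ordinary text, software that searches or compares Mandaic text may want to ignore them. The following is a minimal sketch of that idea, not part of the proposal: the function names `strip_marks` and `marks_used` are hypothetical, and the encoding of lebba as AL, AKSA, AB plus gemination mark, HALQA is an assumption read off the transliteration LIḄA, using the code points proposed in section 8.

```python
# Code points proposed in this document for the three diacritical marks.
MANDAIC_MARKS = {
    0x0859: "MANDAIC AFFRICATION MARK",
    0x085A: "MANDAIC VOCALIZATION MARK",
    0x085B: "MANDAIC GEMINATION MARK",
}


def strip_marks(text, keep=frozenset()):
    """Drop Mandaic diacritical marks, except code points listed in `keep`."""
    return "".join(
        ch for ch in text
        if ord(ch) not in MANDAIC_MARKS or ord(ch) in keep
    )


def marks_used(text):
    """Return the sorted names of the Mandaic marks that occur in `text`."""
    return sorted({MANDAIC_MARKS[ord(ch)] for ch in text if ord(ch) in MANDAIC_MARKS})


# Assumed encoding of LIḄA "lebba" (heart), in logical order:
# AL, AKSA, AB + GEMINATION MARK, HALQA.
lebba = "\u084B\u0849\u0841\u085B\u0840"

print(marks_used(lebba))
print(strip_marks(lebba) == "\u084B\u0849\u0841\u0840")
```

The `keep` parameter is there because the affrication mark changes which consonant is meant; a caller might choose to retain it while dropping the other two marks.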

3. Joining behaviour. Mandaic has fully-developed joining behaviour. The table below shows the joining forms.

    Dual-joining Mandaic characters (forms Xn, Xr, Xm, Xl):
        AB, AG, AD, AH, USHENNA, IT, ATT, AK, AL, AM, AN, AS, AP, ASZ, AQ, AR, AT
    Right-joining Mandaic characters (forms Xn, Xr):
        HALQA, AZ, IN, AKSA, ASH
    Non-joining Mandaic characters (form Xn):
        DUSHENNA, KAD, AIN

ࡖ U+0856 MANDAIC LETTER DUSHENNA (also called adu), transliterated ḏ, is derived from an old ligature of ࡃ U+0843 MANDAIC LETTER AD d and ࡉ U+0849 MANDAIC LETTER AKSA i, used in Aramaic to write the relative pronoun and the genitive exponent di. It is used as an undecomposable letter in its own right (like Danish æ), usually written proclitically as the first letter of a word, or written alone. Note that AD and AKSA ligate normally: ࡃࡉ di = AKSA i + AD d (reading right to left).

Similarly, ࡗ U+0857 MANDAIC LETTER KAD is derived from an old ligature of ࡊ U+084A MANDAIC LETTER AK k and ࡖ U+0856 MANDAIC LETTER DUSHENNA di, used in Aramaic to write the word kḏ 'when, as, like'; compare Hebrew kədi. It is also used as an undecomposable letter in its own right, usually written alone. While AK is dual-joining, KAD never joins with a preceding character; the joining behaviour is different between the two despite the origin of KAD in AK + DUSHENNA. Compare ࡐࡊࡖ pkḏ = DUSHENNA ḏ + AK k + AP p (where AP and AK join as normal and AK and DUSHENNA do not join) with ࡐࡗ pkḏ = KAD kḏ + AP p (where AP and KAD do not join). Note also the similarity of KAD and the syllable ki: ࡊࡉ ki = AKSA i + AK k (reading right to left).
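The joining classes above can be turned into a small form-selection routine. The sketch below is illustrative only and makes several assumptions that the proposal does not state: the function names are hypothetical, the three diacritical marks are treated as transparent to joining, every character outside the table (including U+085F MANDAIC KASHIDA) is treated as non-joining, and the form labels follow the usual reading of Xn (unjoined), Xr (joined to the preceding letter), Xl (joined to the following letter), and Xm (joined on both sides).

```python
# Joining classes as listed in this proposal's table (code points from section 8).
DUAL = {0x0841, 0x0842, 0x0843, 0x0844, 0x0845, 0x0847, 0x0848, 0x084A, 0x084B,
        0x084C, 0x084D, 0x084E, 0x0850, 0x0851, 0x0852, 0x0853, 0x0855}
RIGHT = {0x0840, 0x0846, 0x0849, 0x084F, 0x0854}
# Assumption: the diacritical marks do not interrupt joining.
TRANSPARENT = {0x0859, 0x085A, 0x085B}

FORMS = {(False, False): "Xn", (True, False): "Xr",
         (False, True): "Xl", (True, True): "Xm"}


def joining_class(cp):
    """Return 'D', 'R', 'T', or 'U' (non-joining, including DUSHENNA, KAD, AIN)."""
    if cp in DUAL:
        return "D"
    if cp in RIGHT:
        return "R"
    if cp in TRANSPARENT:
        return "T"
    return "U"


def joining_forms(text):
    """Return (code point, form) for every non-mark character, in logical order."""
    cps = [ord(ch) for ch in text if joining_class(ord(ch)) != "T"]
    result = []
    for i, cp in enumerate(cps):
        cls = joining_class(cp)
        prev_cls = joining_class(cps[i - 1]) if i > 0 else "U"
        next_cls = joining_class(cps[i + 1]) if i + 1 < len(cps) else "U"
        # A letter joins backwards only if the preceding letter is dual-joining.
        joins_prev = cls in ("D", "R") and prev_cls == "D"
        # Only dual-joining letters can join forwards.
        joins_next = cls == "D" and next_cls in ("D", "R")
        result.append((cp, FORMS[(joins_prev, joins_next)]))
    return result


# AP + AK + DUSHENNA: AP and AK join, DUSHENNA stays unjoined.
for cp, form in joining_forms("\u0850\u084A\u0856"):
    print(f"U+{cp:04X} {form}")
```

Run on AP + KAD (`"\u0850\u0857"`), the same routine leaves both letters in their Xn form, which matches the contrast drawn in the text between the two spellings of pkḏ.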

U+0858 ࡘ MANDAIC LETTER AIN is a borrowing from U+0639 ع ARABIC LETTER AIN and as noted above does not combine with other characters.

4. Punctuation. Sentence punctuation is used, rather sparsely. Two script-specific marks are used; U+085D MANDAIC SMALL PUNCTUATION represents a minor break (analogous to a comma), and U+085E MANDAIC LARGE PUNCTUATION represents a major break (analogous to a full stop). In legacy fonts these are encoded on COMMA and FULL STOP respectively.

5. Collating order. The order of the letters in the code chart is their alphabetical order. The diacritical marks do not affect primary weights but are taken into consideration in tie-breaking.

6. Character names. The transliteration in the character names follows the usual UCS naming conventions, although ASZ has been preferred to what might have been written with ASS.

7. Linebreaking. Line-breaking properties for Mandaic are the same as those for Syriac. To justify text, U+085F MANDAIC KASHIDA is quite often used. In legacy fonts this is encoded on LOW LINE. The characters U+0640 ARABIC TATWEEL, U+180A MONGOLIAN NIRUGU, and U+07FA NKO LAJANYALAN suggest to us that script-specific encoding is appropriate for scripts which use extenders of this kind. Accordingly, the N'Ko, Mongolian, or Arabic characters (which have N'Ko, Mongolian, and Arabic script properties) should not be used for Mandaic (or for Manichaean, or for Psalter Pahlavi).

8. Unicode Character Properties.
0840;MANDAIC LETTER HALQA;Lo;0;R;;;;;N;;;;;
0841;MANDAIC LETTER AB;Lo;0;R;;;;;N;;;;;
0842;MANDAIC LETTER AG;Lo;0;R;;;;;N;;;;;
0843;MANDAIC LETTER AD;Lo;0;R;;;;;N;;;;;
0844;MANDAIC LETTER AH;Lo;0;R;;;;;N;;;;;
0845;MANDAIC LETTER USHENNA;Lo;0;R;;;;;N;;;;;
0846;MANDAIC LETTER AZ;Lo;0;R;;;;;N;;;;;
0847;MANDAIC LETTER IT;Lo;0;R;;;;;N;;;;;
0848;MANDAIC LETTER ATT;Lo;0;R;;;;;N;;;;;
0849;MANDAIC LETTER AKSA;Lo;0;R;;;;;N;;;;;
084A;MANDAIC LETTER AK;Lo;0;R;;;;;N;;;;;
084B;MANDAIC LETTER AL;Lo;0;R;;;;;N;;;;;
084C;MANDAIC LETTER AM;Lo;0;R;;;;;N;;;;;
084D;MANDAIC LETTER AN;Lo;0;R;;;;;N;;;;;
084E;MANDAIC LETTER AS;Lo;0;R;;;;;N;;;;;
084F;MANDAIC LETTER IN;Lo;0;R;;;;;N;;;;;
0850;MANDAIC LETTER AP;Lo;0;R;;;;;N;;;;;
0851;MANDAIC LETTER ASZ;Lo;0;R;;;;;N;;;;;
0852;MANDAIC LETTER AQ;Lo;0;R;;;;;N;;;;;
0853;MANDAIC LETTER AR;Lo;0;R;;;;;N;;;;;
0854;MANDAIC LETTER ASH;Lo;0;R;;;;;N;;;;;
0855;MANDAIC LETTER AT;Lo;0;R;;;;;N;;;;;
0856;MANDAIC LETTER DUSHENNA;Lo;0;R;;;;;N;;;;;
0857;MANDAIC LETTER KAD;Lo;0;R;;;;;N;;;;;
0858;MANDAIC LETTER AIN;Lo;0;R;;;;;N;;;;;
0859;MANDAIC AFFRICATION MARK;Mn;220;NSM;;;;;N;;;;;
085A;MANDAIC VOCALIZATION MARK;Mn;220;NSM;;;;;N;;;;;
085B;MANDAIC GEMINATION MARK;Mn;220;NSM;;;;;N;;;;;
085D;MANDAIC SMALL PUNCTUATION;Po;0;ON;;;;;N;;;;;
085E;MANDAIC LARGE PUNCTUATION;Po;0;ON;;;;;N;;;;;
085F;MANDAIC KASHIDA;Lm;0;R;;;;;N;;;;;

9. Bibliography.

Al-Mubaraki, Brayan Majid, and Majid Fandi Al-Mubaraki. 2006. A Mandaic dictionary: Mandaic-English. Sydney, Australia: Majid Fandi Al-Mubaraki.
Daniels, Peter T., and William Bright, eds. 1996. The world's writing systems. New York; Oxford: Oxford University Press. ISBN 0-19-507993-0

Faulmann, Carl. 1990 (1880). Das Buch der Schrift. Frankfurt am Main: Eichborn. ISBN 3-8218-1720-8
Haarmann, Harald. 1990. Die Universalgeschichte der Schrift. Frankfurt: Campus. ISBN 3-593-34346-0
Macuch, R., and E. S. Drower. 1963. A Mandaic dictionary. Oxford: Clarendon Press.

10. Acknowledgements. This project was made possible in part by a grant from the U.S. National Endowment for the Humanities, which funded the Universal Scripts Project (part of the Script Encoding Initiative at UC Berkeley) in respect of the Mandaic encoding.

0840 Mandaic 085F

Letters
0840 ࡀ MANDAIC LETTER HALQA a
0841 ࡁ MANDAIC LETTER AB
0842 ࡂ MANDAIC LETTER AG
0843 ࡃ MANDAIC LETTER AD
0844 ࡄ MANDAIC LETTER AH
0845 ࡅ MANDAIC LETTER USHENNA u
0846 ࡆ MANDAIC LETTER AZ
0847 ࡇ MANDAIC LETTER IT
0848 ࡈ MANDAIC LETTER ATT
0849 ࡉ MANDAIC LETTER AKSA i
084A ࡊ MANDAIC LETTER AK
084B ࡋ MANDAIC LETTER AL
084C ࡌ MANDAIC LETTER AM
084D ࡍ MANDAIC LETTER AN
084E ࡎ MANDAIC LETTER AS
084F ࡏ MANDAIC LETTER IN
0850 ࡐ MANDAIC LETTER AP
0851 ࡑ MANDAIC LETTER ASZ
0852 ࡒ MANDAIC LETTER AQ
0853 ࡓ MANDAIC LETTER AR
0854 ࡔ MANDAIC LETTER ASH
0855 ࡕ MANDAIC LETTER AT
0856 ࡖ MANDAIC LETTER DUSHENNA di
0857 ࡗ MANDAIC LETTER KAD
0858 ࡘ MANDAIC LETTER AIN

Diacritics
0859 ◌࡙ MANDAIC AFFRICATION MARK
085A ◌࡚ MANDAIC VOCALIZATION MARK
085B ◌࡛ MANDAIC GEMINATION MARK

Punctuation
085D MANDAIC SMALL PUNCTUATION
085E MANDAIC LARGE PUNCTUATION

Letter extender
085F MANDAIC KASHIDA

Printed using UniBook (http://www.unicode.org/unibook/) Date: 2008-08-04
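Since the code chart order is stated to be the alphabetical order, and the marks are said to matter only for tie-breaking, a two-level sort key can be sketched directly from the property records in section 8. This is a simplified illustration and an assumption on my part, not the Unicode Collation Algorithm and not something the proposal specifies: the helper names `parse_records` and `sort_key` are hypothetical, only a handful of records are copied in, and the tie-break simply compares the full code point sequence.

```python
# A few records copied from section 8 (15 semicolon-separated fields each).
RECORDS = """\
0840;MANDAIC LETTER HALQA;Lo;0;R;;;;;N;;;;;
0841;MANDAIC LETTER AB;Lo;0;R;;;;;N;;;;;
0849;MANDAIC LETTER AKSA;Lo;0;R;;;;;N;;;;;
084B;MANDAIC LETTER AL;Lo;0;R;;;;;N;;;;;
0859;MANDAIC AFFRICATION MARK;Mn;220;NSM;;;;;N;;;;;
085A;MANDAIC VOCALIZATION MARK;Mn;220;NSM;;;;;N;;;;;
085B;MANDAIC GEMINATION MARK;Mn;220;NSM;;;;;N;;;;;
"""


def parse_records(data):
    """Map code point -> (name, general category, combining class, bidi class)."""
    table = {}
    for line in data.splitlines():
        fields = line.split(";")
        if len(fields) != 15:
            raise ValueError(f"expected 15 fields, got {len(fields)}: {line!r}")
        table[int(fields[0], 16)] = (fields[1], fields[2], int(fields[3]), fields[4])
    return table


PROPS = parse_records(RECORDS)
# The diacritical marks are the records with general category Mn.
MARKS = {cp for cp, (_name, gc, _ccc, _bidi) in PROPS.items() if gc == "Mn"}


def sort_key(word):
    """Primary level ignores marks; the full sequence breaks ties."""
    primary = tuple(ord(ch) for ch in word if ord(ch) not in MARKS)
    tie_break = tuple(ord(ch) for ch in word)
    return (primary, tie_break)


words = [
    "\u0841\u084B",        # AB, AL
    "\u0841\u085B\u0840",  # AB + gemination mark, HALQA
    "\u0841\u0840",        # AB, HALQA
]
for w in sorted(words, key=sort_key):
    print(" ".join(f"U+{ord(ch):04X}" for ch in w))
```

With this key, the marked and unmarked spellings of AB + HALQA sort next to each other and before AB + AL, whereas a plain code point sort would push the marked form after AB + AL.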

Figures

Figure 1. Chart with transliterations into Latin and Hebrew from Macuch 1963. DUSHENNA and KAD are given at the end.

Figure 2. Chart showing basic syllables in Mandaic.

Figure 3. Sample text in Mandaic.

Figure 4. Sample text in Mandaic. Note the frequent use of extensions with U+085F MANDAIC KASHIDA and the use of U+085E MANDAIC LARGE PUNCTUATION.

Figure 5. Sample text in Mandaic. Note the use of U+085F MANDAIC KASHIDA and the frequent use of U+085D MANDAIC SMALL PUNCTUATION.

Figure 6. Sample text in Mandaic.

Figure 7. Sample of Mandaic text from Daniels 1996.

Figure 8. Sample text in Mandaic showing extended letters for foreign sounds. This is a modern letter, not an old manuscript.

Figure 9. Sample text in Mandaic showing extended letters for foreign sounds. This is a modern letter, not an old manuscript.

Figure 10. Sample text in Mandaic.

Figure 11. Sample text in Mandaic.

Figure 12. Chart of Mandaic letters and ligatures from Faulmann 1880. The specific glyphs of these ligatures would be font-specific; they follow the normal shaping rules as in a standard font: dušenna, nd, ṣu, kd, nu, ṣl, kr, ni, ṣr, ki, nt, nq, kl, pu, ut, ku, pl, kt, pr.

Figure 13. Sample text in Mandaic from the Imprimerie Nationale.

A. Administrative
1. Title: Proposal for encoding the Mandaic script in the BMP of the UCS
2. Requester's name: Michael Everson
3. Requester type (Member body/liaison/individual contribution): Individual contribution.
4. Submission date: 2008-08-04
5. Requester's reference (if applicable)
6. Choose one of the following:
6a. This is a complete proposal
6b. More information will be provided later

B. Technical General
1. Choose one of the following:
1a. This proposal is for a new script (set of characters)
1b. Proposed name of script: Mandaic.
1c. The proposal is for addition of character(s) to an existing block
1d. Name of the existing block
2. Number of characters in proposal: 31.
3. Proposed category (A-Contemporary; B.1-Specialized (small collection); B.2-Specialized (large collection); C-Major extinct; D-Attested extinct; E-Minor extinct; F-Archaic Hieroglyphic or Ideographic; G-Obscure or questionable usage symbols): Category B.1.
4a. Is a repertoire including character names provided?
4b. If YES, are the names in accordance with the character naming guidelines in Annex L of P&P document?
4c. Are the character shapes attached in a legible form suitable for review?
5a. Who will provide the appropriate computerized font (ordered preference: True Type, or PostScript format) for publishing the standard? Brian Mubaraki and Michael Everson.
5b. If available now, identify source(s) for the font (include address, e-mail, ftp-site, etc.) and indicate the tools used: Michael Everson, Fontographer.
6a. Are references (to other character sets, dictionaries, descriptive texts etc.) provided?
6b. Are published examples of use (such as samples from newspapers, magazines, or other sources) of proposed characters attached?
7. Does the proposal address other aspects of character data processing (if applicable) such as input, presentation, sorting, searching, indexing, transliteration etc. (if yes please enclose information)?
8. Submitters are invited to provide any additional information about Properties of the proposed Character(s) or Script that will assist in correct understanding of and correct linguistic processing of the proposed character(s) or script. Examples of such properties are: Casing information, Numeric information, Currency information, Display behaviour information such as line breaks, widths etc., Combining behaviour, Spacing behaviour, Directional behaviour, Default Collation behaviour, relevance in Mark Up contexts, Compatibility equivalence and other Unicode normalization related information. See the Unicode standard at http://www.unicode.org for such information on other scripts. Also see Unicode Character Database http://www.unicode.org/public/unidata/UnicodeCharacterDatabase.html and associated Unicode Technical Reports for information needed for consideration by the Unicode Technical Committee for inclusion in the Unicode Standard. See above.

C. Technical Justification
1. Has this proposal for addition of character(s) been submitted before? If YES, explain.
2a. Has contact been made to members of the user community (for example: National Body, user groups of the script or characters, other experts, etc.)?
2b. If YES, with whom? Brian Mubaraki, Charles Häberl, Jorunn Buckley, William Clocksin
2c. If YES, available relevant documents

3. Information on the user community for the proposed characters (for example: size, demographics, information technology use, or publishing use) is included? There are some 60,000-70,000 Mandaeans worldwide.
4a. The context of use for the proposed characters (type of use; common or rare): Traditional and liturgical use.
4b. Reference
5a. Are the proposed characters in current use by the user community?
5b. If YES, where? In Iran, Iraq, and elsewhere.
6a. After giving due considerations to the principles in the P&P document must the proposed characters be entirely in the BMP?
6b. If YES, is a rationale provided?
6c. If YES, reference: Contemporary use and accordance with the Roadmap.
7. Should the proposed characters be kept together in a contiguous range (rather than being scattered)?
8a. Can any of the proposed characters be considered a presentation form of an existing character or character sequence?
8b. If YES, is a rationale for its inclusion provided?
8c. If YES, reference
9a. Can any of the proposed characters be encoded using a composed character sequence of either existing characters or other proposed characters?
9b. If YES, is a rationale for its inclusion provided?
9c. If YES, reference
10a. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character?
10b. If YES, is a rationale for its inclusion provided?
10c. If YES, reference
11a. Does the proposal include use of combining characters and/or use of composite sequences (see clauses 4.12 and 4.14 in ISO/IEC 10646-1: 2000)?
11b. If YES, is a rationale for such use provided?
11c. If YES, reference
11d. Is a list of composite sequences and their corresponding glyph images (graphic symbols) provided?
11e. If YES, reference
12a. Does the proposal contain characters with any special properties such as control function or similar semantics?
12b. If YES, describe in detail (include attachment if necessary)
13a. Does the proposal contain any Ideographic compatibility character(s)?
13b. If YES, is the equivalent corresponding unified ideographic character(s) identified?